I want your feedback to make the book better for you and other readers. If you find typos, errors, or places where the text may be improved, please let me know. The best ways to provide feedback are by GitHub or hypothes.is annotations.
Opening an issue or submitting a pull request on GitHub: https://github.com/isaactpetersen/Fantasy-Football-Analytics-Textbook
Adding an annotation using hypothes.is.
To add an annotation, select some text and then click the
symbol on the pop-up menu.
To see the annotations of others, click the
symbol in the upper right-hand corner of the page.
11 Mixed Models
11.1 Getting Started
11.1.1 Load Packages
11.1.2 Specify Package Options
11.1.3 Load Data
We created the player_stats_weekly.RData and player_stats_seasonal.RData objects in Section 3.21.3.
11.2 Overview of Mixed Models
We will discuss a modeling framework that goes by many terms, including mixed models, mixed-effects models, multilevel models, hierarchical linear models.
11.2.1 Ecological Fallacy
11.2.2 Simpson’s Paradox
11.3 Fantasy Points Per Season by Position, Age, and Experience
Code
player_stats_seasonal_offense_subset <- player_stats_seasonal_offense %>%
filter(position_group %in% c("QB","RB","WR","TE"))
player_stats_seasonal_offense_subset$position[which(player_stats_seasonal_offense_subset$position == "HB")] <- "RB"
player_stats_seasonal_kicking_subset <- player_stats_seasonal_kicking %>%
filter(position == "K")
player_stats_seasonal_offense_subset <- bind_rows(
player_stats_seasonal_offense_subset,
player_stats_seasonal_kicking_subset
)
player_stats_seasonal_offense_subset$player_idFactor <- factor(player_stats_seasonal_offense_subset$player_id)
player_stats_seasonal_offense_subset$positionFactor <- factor(player_stats_seasonal_offense_subset$position)Code
seasons17week <- 2001:2020
seasons18week <- 2021:max(nfl_depthCharts$season, na.rm = TRUE)
endOfSeasonDepthCharts <- nfl_depthCharts %>%
filter((season %in% seasons17week & week == 18) | (season %in% seasons18week & week == 19)) # get end-of-season depth charts
qb1s <- endOfSeasonDepthCharts %>%
filter(position == "QB", depth_team == 1)
fb1s <- endOfSeasonDepthCharts %>%
filter(position == "FB", depth_team == 1)
k1s <- endOfSeasonDepthCharts %>%
filter(position == "K", depth_team == 1)
rb1s <- endOfSeasonDepthCharts %>%
filter(position == "RB", depth_team == 1)
wr1s <- endOfSeasonDepthCharts %>%
filter(position == "WR", depth_team == 1)
te1s <- endOfSeasonDepthCharts %>%
filter(position == "TE", depth_team == 1)
player_stats_seasonal_offense_subsetDepth <- player_stats_seasonal_offense_subset %>%
filter(player_id %in% c(
qb1s$gsis_id,
fb1s$gsis_id,
k1s$gsis_id,
rb1s$gsis_id,
wr1s$gsis_id,
te1s$gsis_id
))Create a newdata object for generating the plots of model-implied fantasy points by age and position:
Code
pointsPerSeason_positionAge_newData <- expand.grid(
positionFactor = factor(c("FB","QB","RB","TE","WR")), #,"K"
age = seq(from = 20, to = 40, length.out = 10000)
)
pointsPerSeason_positionAge_newData$ageCentered20 <- pointsPerSeason_positionAge_newData$age - 20
pointsPerSeason_positionAge_newData$ageCentered20Quadratic <- pointsPerSeason_positionAge_newData$ageCentered20 ^ 2
pointsPerSeason_positionAge_newData$years_of_experience <- floor(pointsPerSeason_positionAge_newData$age - 22) # assuming that most players start at age 22 (i.e., rookie year) and thus have 1 year of experience at age 23
pointsPerSeason_positionAge_newData$years_of_experience[which(pointsPerSeason_positionAge_newData$years_of_experience < 0)] <- 0Create an object with complete cases for generating the plots of individuals’ -implied fantasy points by age and position:
11.3.1 Scatterplots of Fantasy Points by Age and Position
Scatterplots are a helpful tool for quickly examining the association between two variables. However, scatterplots—as well as correlation and multiple regression—can hide meaningful associations that differ across units of analysis.
11.3.1.1 Quarterbacks
A scatterplot of Quarterbacks’ fantasy points by age is in Figure 11.1.
Code
plot_scatterplotFantasyPointsByAgeQB <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "QB") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_point(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
geom_smooth(
mapping = aes(
x = age,
y = fantasy_points),
inherit.aes = FALSE
) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Quarterbacks"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_scatterplotFantasyPointsByAgeQB,
tooltip = c("age","fantasy_points","text","label"))Based on the scatterplot (and the bivariate association below), Quarterbacks’ fantasy points appear to increase with age.
Code
Pearson's product-moment correlation
data: age and fantasy_points
t = 3.2358, df = 1051, p-value = 0.001251
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.03914186 0.15877861
sample estimates:
cor
0.09931915
11.3.1.2 Fullbacks
A scatterplot of Fullbacks’ fantasy points by age is in Figure 11.2.
Code
plot_scatterplotFantasyPointsByAgeFB <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "FB") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_point(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
geom_smooth(
mapping = aes(
x = age,
y = fantasy_points),
inherit.aes = FALSE
) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Fullbacks"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_scatterplotFantasyPointsByAgeFB,
tooltip = c("age","fantasy_points","text","label"))Based on the scatterplot, Fullbacks’ fantasy points appear to be relatively stable across ages.
11.3.1.3 Running Backs
A scatterplot of Running Backs’ fantasy points by age is in Figure 11.3.
Code
plot_scatterplotFantasyPointsByAgeRB <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "RB") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_point(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
geom_smooth(
mapping = aes(
x = age,
y = fantasy_points),
inherit.aes = FALSE
) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Running Backs"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_scatterplotFantasyPointsByAgeRB,
tooltip = c("age","fantasy_points","text","label"))Based on the scatterplot, Running Backs’ fantasy points appear to be relatively stable across ages.
11.3.1.4 Wide Receivers
A scatterplot of Wide Receivers’ fantasy points by age is in Figure 11.4.
Code
plot_scatterplotFantasyPointsByAgeWR <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "WR") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_point(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
geom_smooth(
mapping = aes(
x = age,
y = fantasy_points),
inherit.aes = FALSE
) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Wide Receivers"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_scatterplotFantasyPointsByAgeWR,
tooltip = c("age","fantasy_points","text","label"))Based on the scatterplot, Wide Receivers’ fantasy points appear to be relatively stable across ages.
11.3.1.5 Tight Ends
A scatterplot of Tight Ends’ fantasy points by age is in Figure 11.5.
Code
plot_scatterplotFantasyPointsByAgeTE <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "TE") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_point(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
geom_smooth(
mapping = aes(
x = age,
y = fantasy_points),
inherit.aes = FALSE
) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Tight Ends"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_scatterplotFantasyPointsByAgeTE,
tooltip = c("age","fantasy_points","text","label"))Based on the scatterplot (and the bivariate association below), Tight Ends’ fantasy points appear to increase with age.
Code
Pearson's product-moment correlation
data: age and fantasy_points
t = 5.3884, df = 1341, p-value = 8.388e-08
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.09280906 0.19753022
sample estimates:
cor
0.1455774
11.3.2 Plots of Raw Trajectories of Fantasy Points By Age and Player
Scatterplots can be helpful for quickly visualizing the association between two variables. However, as mentioned earlier, scatterplots can hide the association between variables at different units of analysis. For instance, consider that we are trying to predict how a player will perform based on their age. We are interested not only in what the association is between age and fantasy points between players (i.e., a between-person association). We are also interested in what the association is between age and fantasy points within a given player (and within each player; i.e., a within-individual association). Arguably, the within-individual association between age and fantasy points is more relevant to the prediction of performance than the association between age and fantasy points between players. Assuming that the between-player association between age and fantasy points is the same as the within-player association when it is not is an example of the ecological fallacy.
Below, we depict players’ raw trajectories of fantasy points as a function of age. These are known as spaghetti plots. By examining the trajectory for each player, we can get a better understanding of hor performance changes (within an individual) as a function of age.
11.3.2.1 Quarterbacks
A plot of Quarterbacks’ raw fantasy points data by age is in Figure 11.6.
Code
plot_rawFantasyPointsByAgeQB <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "QB") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_line(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Quarterbacks"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_rawFantasyPointsByAgeQB,
tooltip = c("age","fantasy_points","text","label"))11.3.2.2 Fullbacks
A plot of Fullbacks’ raw fantasy points data by age is in Figure 11.7.
Code
plot_rawFantasyPointsByAgeFB <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "FB") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_line(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Fullbacks"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_rawFantasyPointsByAgeFB,
tooltip = c("age","fantasy_points","text","label"))11.3.2.3 Running Backs
A plot of Running Backs’ raw fantasy points data by age is in Figure 11.8.
Code
plot_rawFantasyPointsByAgeRB <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "RB") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_line(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Running Backs"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_rawFantasyPointsByAgeRB,
tooltip = c("age","fantasy_points","text","label"))11.3.2.4 Wide Receivers
A plot of Wide Receivers’ raw fantasy points data by age is in Figure 11.9.
Code
plot_rawFantasyPointsByAgeWR <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "WR") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_line(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Wide Receivers"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_rawFantasyPointsByAgeWR,
tooltip = c("age","fantasy_points","text","label"))11.3.2.5 Tight Ends
A plot of Tight Ends’ raw fantasy points data by age is in Figure 11.10.
Code
plot_rawFantasyPointsByAgeTE <- ggplot(
data = player_stats_seasonal_offense_subset %>%
filter(position == "TE") %>%
mutate(
age = round(age, 2),
fantasy_points = round(fantasy_points, 2)
),
mapping = aes(
x = age,
y = fantasy_points,
color = player_id)) +
geom_line(
aes(
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
)) +
scale_color_viridis(discrete = TRUE) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Tight Ends"
) +
theme_classic() +
theme(legend.position = "none")
ggplotly(
plot_rawFantasyPointsByAgeTE,
tooltip = c("age","fantasy_points","text","label"))11.3.3 Linear Regression Models
11.3.3.1 Null Model
Code
Call:
lm(formula = fantasy_points ~ 1, data = player_stats_seasonal_offense_subset,
na.action = "na.exclude")
Residuals:
Min 1Q Median 3Q Max
-68.13 -53.35 -29.35 28.75 418.61
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 60.8540 0.6321 96.27 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 73.53 on 13531 degrees of freedom
(1043 observations deleted due to missingness)
[1] 0
[1] 154719.7
[1] 154719.7
Code
pointsPerSeason_positionAge_newData$fantasyPoints_nullModel <- predict(
object = pointsPerSeason_nullModel,
newdata = pointsPerSeason_positionAge_newData
)
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_nullModel
)
) +
geom_line(linewidth = 2) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Null Model"
) +
theme_classic()
11.3.3.2 Linear Model
Code
Call:
lm(formula = fantasy_points ~ positionFactor + ageCentered20 +
positionFactor:ageCentered20, data = player_stats_seasonal_offense_subset,
na.action = "na.exclude")
Residuals:
Min 1Q Median 3Q Max
-163.81 -51.22 -16.91 39.68 357.82
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.0884 14.3594 0.772 0.44002
positionFactorQB 95.0782 15.2244 6.245 4.51e-10 ***
positionFactorRB 68.3543 15.0568 4.540 5.73e-06 ***
positionFactorTE 14.8749 15.1809 0.980 0.32720
positionFactorWR 42.1981 14.8233 2.847 0.00443 **
ageCentered20 0.7358 1.9891 0.370 0.71144
positionFactorQB:ageCentered20 2.1806 2.0611 1.058 0.29012
positionFactorRB:ageCentered20 -1.1383 2.1181 -0.537 0.59102
positionFactorTE:ageCentered20 1.3905 2.1044 0.661 0.50879
positionFactorWR:ageCentered20 1.1822 2.0663 0.572 0.56726
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 75.72 on 6435 degrees of freedom
(8130 observations deleted due to missingness)
Multiple R-squared: 0.1423, Adjusted R-squared: 0.1411
F-statistic: 118.6 on 9 and 6435 DF, p-value: < 2.2e-16
[1] 0.1422979
[1] 74077.77
[1] 74077.81
Code
pointsPerSeason_positionAge_newData$fantasyPoints_linearRegression <- predict(
object = pointsPerSeason_linearRegression,
newdata = pointsPerSeason_positionAge_newData
)
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_linearRegression,
color = positionFactor
)
) +
geom_line(linewidth = 2) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Linear Regression Model"
) +
theme_classic()
11.3.3.3 Quadratic Model
Code
pointsPerSeason_quadraticRegression <- lm(
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic + positionFactor:ageCentered20 + positionFactor:ageCentered20Quadratic,
data = player_stats_seasonal_offense_subset,
na.action = "na.exclude"
)
summary(pointsPerSeason_quadraticRegression)
Call:
lm(formula = fantasy_points ~ positionFactor + ageCentered20 +
ageCentered20Quadratic + positionFactor:ageCentered20 + positionFactor:ageCentered20Quadratic,
data = player_stats_seasonal_offense_subset, na.action = "na.exclude")
Residuals:
Min 1Q Median 3Q Max
-197.10 -50.92 -17.12 39.78 357.46
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.2343 33.8844 -0.095 0.9240
positionFactorQB 145.6404 35.1887 4.139 3.54e-05
positionFactorRB 83.7797 34.9318 2.398 0.0165
positionFactorTE 25.8075 35.1789 0.734 0.4632
positionFactorWR 51.6417 34.6205 1.492 0.1358
ageCentered20 5.1428 9.6526 0.533 0.5942
ageCentered20Quadratic -0.2954 0.6331 -0.467 0.6408
positionFactorQB:ageCentered20 -11.4153 9.8800 -1.155 0.2480
positionFactorRB:ageCentered20 -5.9440 10.0226 -0.593 0.5532
positionFactorTE:ageCentered20 -1.9854 9.9837 -0.199 0.8424
positionFactorWR:ageCentered20 -1.5417 9.8933 -0.156 0.8762
positionFactorQB:ageCentered20Quadratic 0.7527 0.6412 1.174 0.2404
positionFactorRB:ageCentered20Quadratic 0.3247 0.6613 0.491 0.6235
positionFactorTE:ageCentered20Quadratic 0.2308 0.6515 0.354 0.7232
positionFactorWR:ageCentered20Quadratic 0.1767 0.6501 0.272 0.7858
(Intercept)
positionFactorQB ***
positionFactorRB *
positionFactorTE
positionFactorWR
ageCentered20
ageCentered20Quadratic
positionFactorQB:ageCentered20
positionFactorRB:ageCentered20
positionFactorTE:ageCentered20
positionFactorWR:ageCentered20
positionFactorQB:ageCentered20Quadratic
positionFactorRB:ageCentered20Quadratic
positionFactorTE:ageCentered20Quadratic
positionFactorWR:ageCentered20Quadratic
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 75.62 on 6430 degrees of freedom
(8130 observations deleted due to missingness)
Multiple R-squared: 0.1451, Adjusted R-squared: 0.1433
F-statistic: 77.98 on 14 and 6430 DF, p-value: < 2.2e-16
[1] 0.1451436
[1] 74066.35
[1] 74066.44
Code
pointsPerSeason_positionAge_newData$fantasyPoints_quadraticRegression <- predict(
object = pointsPerSeason_quadraticRegression,
newdata = pointsPerSeason_positionAge_newData
)
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_quadraticRegression,
color = positionFactor
)
) +
geom_line(linewidth = 2) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Quadratic Regression Model"
) +
theme_classic()
11.3.3.4 Compare Models
Code
Code
dAIC df
quadraticRegression 0.0 16
linearRegression 11.4 11
nullModel 80653.3 2
[1] 0
[1] 0.1422979
[1] 0.1451436
[1] 73167060
[1] 36895690
[1] 36773276
'log Lik.' -77357.85 (df=2)
'log Lik.' -37027.89 (df=11)
'log Lik.' -37017.18 (df=16)
11.3.4 Mixed Models
By accounting for which player each observation comes from using mixed models, we can examine the association between age and fantasy points in a more meaningful way, without violating the assumption in multiple regression that the observations are independent (i.e., that the residuals are uncorrelated).
11.3.4.1 Random Intercepts Model
Code
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: fantasy_points ~ 1 + (1 | player_idFactor)
Data: player_stats_seasonal_offense_subset
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
148044.0 148066.6 -74019.0 148038.0 13529
Scaled residuals:
Min 1Q Median 3Q Max
-6.5321 -0.4328 -0.1682 0.3538 5.5810
Random effects:
Groups Name Variance Std.Dev.
player_idFactor (Intercept) 2197 46.87
Residual 2335 48.32
Number of obs: 13532, groups: player_idFactor, 3358
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 44.6942 0.9595 3834.9436 46.58 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R2m R2c
[1,] 0 0.4847453
[1] 148044
numeric(0)
Code
pointsPerSeason_positionAge_newData$fantasyPoints_randomIntercepts <- predict(
object = pointsPerSeason_randomIntercepts,
newdata = pointsPerSeason_positionAge_newData,
re.form = NA
)
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_randomIntercepts
)
) +
geom_line(linewidth = 2) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age",
subtitle = "Random Intercepts Model"
) +
theme_classic()
11.3.4.2 Random Intercepts Model with Position as Fixed-Effect Predictor
Code
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula: fantasy_points ~ positionFactor + (1 | player_idFactor)
Data: player_stats_seasonal_offense_subset
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
147766.3 147818.9 -73876.2 147752.3 13525
Scaled residuals:
Min 1Q Median 3Q Max
-6.5274 -0.4477 -0.1408 0.3593 5.5407
Random effects:
Groups Name Variance Std.Dev.
player_idFactor (Intercept) 1940 44.04
Residual 2336 48.33
Number of obs: 13532, groups: player_idFactor, 3358
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 15.536 4.447 3449.114 3.494 0.000482 ***
positionFactorQB 61.285 5.178 3470.831 11.836 < 2e-16 ***
positionFactorRB 37.890 4.787 3499.181 7.915 3.28e-15 ***
positionFactorTE 9.873 4.903 3499.455 2.014 0.044136 *
positionFactorWR 27.271 4.693 3490.629 5.811 6.75e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) pstFQB pstFRB pstFTE
postnFctrQB -0.859
postnFctrRB -0.929 0.798
postnFctrTE -0.907 0.779 0.842
postnFctrWR -0.948 0.814 0.880 0.859
R2m R2c
[1,] 0.06139709 0.487171
positionFactor emmean SE df lower.CL upper.CL
FB 15.5 4.45 2896 6.81 24.3
QB 76.8 2.65 2970 71.62 82.0
RB 53.4 1.77 3241 49.95 56.9
TE 25.4 2.07 3159 21.35 29.5
WR 42.8 1.50 3284 39.86 45.8
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
[1] 147766.3
Code
pointsPerSeason_positionAge_newData$fantasyPoints_position <- predict(
object = pointsPerSeason_position,
newdata = pointsPerSeason_positionAge_newData,
re.form = NA
)
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_position,
color = positionFactor
)
) +
geom_line(linewidth = 2) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Random Intercepts Model With Position as Fixed-Effect Predictor"
) +
theme_classic()
11.3.4.3 Identify the Best-Fitting Functional Form of Age
11.3.4.3.1 Linear Models
Code
Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula:
fantasy_points ~ positionFactor + ageCentered20 + (1 | player_idFactor)
Data: player_stats_seasonal_offense_subset
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
71242.9 71297.1 -35613.5 71226.9 6437
Scaled residuals:
Min 1Q Median 3Q Max
-6.3406 -0.4440 -0.1160 0.3999 4.5152
Random effects:
Groups Name Variance Std.Dev.
player_idFactor (Intercept) 2555 50.55
Residual 2682 51.79
Number of obs: 6445, groups: player_idFactor, 1292
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 22.102 8.801 1433.812 2.511 0.012139 *
positionFactorQB 89.929 9.765 1335.477 9.209 < 2e-16 ***
positionFactorRB 47.609 9.254 1353.881 5.145 3.07e-07 ***
positionFactorTE 16.785 9.348 1352.423 1.796 0.072779 .
positionFactorWR 34.404 9.045 1355.844 3.804 0.000149 ***
ageCentered20 -1.495 0.249 5966.317 -6.005 2.02e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) pstFQB pstFRB pstFTE pstFWR
postnFctrQB -0.869
postnFctrRB -0.926 0.829
postnFctrTE -0.913 0.821 0.867
postnFctrWR -0.947 0.848 0.896 0.887
ageCentrd20 -0.180 -0.020 0.033 0.012 0.027
R2m R2c
[1,] 0.09922355 0.5386768
positionFactor emmean SE df lower.CL upper.CL
FB 12.5 8.67 1226 -4.49 29.6
QB 102.5 4.53 1173 93.58 111.3
RB 60.1 3.28 1266 53.72 66.6
TE 29.3 3.53 1253 22.39 36.2
WR 46.9 2.63 1310 41.79 52.1
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
ageCentered20 emmean SE df lower.CL upper.CL
6.4 50.3 2.24 1225 45.9 54.7
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
[1] 71242.93
Code
pointsPerSeason_positionAge_newData$fantasyPoints_fixedLinearSlopes <- predict(
object = pointsPerSeason_positionAgeFixedLinearSlopes,
newdata = pointsPerSeason_positionAge_newData,
re.form = NA
)
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_fixedLinearSlopes,
color = positionFactor
)
) +
geom_line(linewidth = 2) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Mixed Model with Random Intercepts and Fixed Linear Slopes"
) +
theme_classic()
Code
pointsPerSeason_positionAgeRandomLinearSlopes <- lmerTest::lmer(
fantasy_points ~ positionFactor + ageCentered20 + (1 + ageCentered20 | player_idFactor),
data = player_stats_seasonal_offense_subset,
REML = FALSE,
control = lmerControl(optimizer = "bobyqa")
)
summary(pointsPerSeason_positionAgeRandomLinearSlopes)Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula:
fantasy_points ~ positionFactor + ageCentered20 + (1 + ageCentered20 |
player_idFactor)
Data: player_stats_seasonal_offense_subset
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
71019.5 71087.2 -35499.8 70999.5 6435
Scaled residuals:
Min 1Q Median 3Q Max
-6.4021 -0.4324 -0.1166 0.3901 4.7118
Random effects:
Groups Name Variance Std.Dev. Corr
player_idFactor (Intercept) 3483.32 59.020
ageCentered20 28.04 5.296 -0.52
Residual 2430.63 49.301
Number of obs: 6445, groups: player_idFactor, 1292
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 24.9438 8.8723 1459.3714 2.811 0.004998 **
positionFactorQB 88.9525 9.7308 1297.2928 9.141 < 2e-16 ***
positionFactorRB 47.3942 9.2153 1322.7062 5.143 3.11e-07 ***
positionFactorTE 16.3712 9.3043 1314.3615 1.760 0.078720 .
positionFactorWR 34.2219 9.0054 1318.7078 3.800 0.000151 ***
ageCentered20 -2.0167 0.3462 716.2648 -5.826 8.59e-09 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) pstFQB pstFRB pstFTE pstFWR
postnFctrQB -0.860
postnFctrRB -0.918 0.828
postnFctrTE -0.904 0.820 0.867
postnFctrWR -0.938 0.848 0.896 0.887
ageCentrd20 -0.237 -0.001 0.039 0.017 0.033
R2m R2c
[1,] 0.09797596 0.5835368
positionFactor emmean SE df lower.CL upper.CL
FB 12.0 8.64 1189 -4.92 29.0
QB 101.0 4.53 1151 92.10 109.9
RB 59.4 3.28 1288 52.99 65.9
TE 28.4 3.52 1242 21.50 35.3
WR 46.3 2.63 1321 41.10 51.4
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
ageCentered20 emmean SE df lower.CL upper.CL
6.4 49.4 2.25 1168 45 53.8
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
[1] 71019.5
Code
pointsPerSeason_positionAge_newData$fantasyPoints_randomLinearSlopes <- predict(
object = pointsPerSeason_positionAgeRandomLinearSlopes,
newdata = pointsPerSeason_positionAge_newData,
re.form = NA
)
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_randomLinearSlopes,
color = positionFactor
)
) +
geom_line(linewidth = 2) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Mixed Model with Random Intercepts and Random Linear Slopes"
) +
theme_classic()
11.3.4.3.2 Quadratic Models
Code
pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes <- lmerTest::lmer(
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic + (1 + ageCentered20 | player_idFactor),
data = player_stats_seasonal_offense_subset,
REML = FALSE,
control = lmerControl(optimizer = "bobyqa")
)
summary(pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes)Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula:
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic +
(1 + ageCentered20 | player_idFactor)
Data: player_stats_seasonal_offense_subset
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
70665.5 70740.0 -35321.7 70643.5 6434
Scaled residuals:
Min 1Q Median 3Q Max
-6.6477 -0.4430 -0.1063 0.3817 4.7823
Random effects:
Groups Name Variance Std.Dev. Corr
player_idFactor (Intercept) 4050.7 63.645
ageCentered20 61.2 7.823 -0.60
Residual 2157.5 46.449
Number of obs: 6445, groups: player_idFactor, 1292
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) -24.02735 9.34231 1728.76809 -2.572 0.0102 *
positionFactorQB 88.81757 9.85929 1344.37099 9.009 < 2e-16 ***
positionFactorRB 51.31903 9.32501 1359.21221 5.503 4.45e-08 ***
positionFactorTE 16.83493 9.41871 1353.13565 1.787 0.0741 .
positionFactorWR 36.99633 9.11732 1357.91920 4.058 5.24e-05 ***
ageCentered20 14.28911 0.89964 4453.56443 15.883 < 2e-16 ***
ageCentered20Quadratic -1.24457 0.05996 3926.72121 -20.756 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) pstFQB pstFRB pstFTE pstFWR agCn20
postnFctrQB -0.828
postnFctrRB -0.889 0.830
postnFctrTE -0.872 0.822 0.869
postnFctrWR -0.907 0.849 0.899 0.889
ageCentrd20 -0.340 -0.004 0.032 0.011 0.027
agCntrd20Qd 0.259 0.007 -0.017 -0.005 -0.014 -0.893
R2m R2c
[1,] 0.1655593 0.6746525
positionFactor emmean SE df lower.CL upper.CL
FB 3.38 8.77 1236 -13.8 20.6
QB 92.19 4.62 1223 83.1 101.3
RB 54.70 3.33 1334 48.2 61.2
TE 20.21 3.58 1293 13.2 27.2
WR 40.37 2.68 1386 35.1 45.6
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
ageCentered20 emmean SE df lower.CL upper.CL
6.4 42.2 2.33 1234 37.6 46.7
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
ageCentered20Quadratic emmean SE df lower.CL upper.CL
51.5 42.2 2.33 1234 37.6 46.7
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
[1] 70665.47
Code
pointsPerSeason_positionAge_newData$fantasyPoints_randomLinearFixedQuadraticSlopes <- predict(
object = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes,
newdata = pointsPerSeason_positionAge_newData,
re.form = NA
)
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_randomLinearFixedQuadraticSlopes,
color = positionFactor
)
) +
geom_line(linewidth = 2) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Mixed Model with Random Intercepts, Random Linear Slopes, and Fixed Quadratic Slopes"
) +
theme_classic()
Code
pointsPerSeason_positionAgeRandomLinearRandomQuadraticSlopes <- lmerTest::lmer(
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic + (1 + ageCentered20 + ageCentered20Quadratic | player_idFactor),
data = player_stats_seasonal_offense_subset,
REML = FALSE,
control = lmerControl(optimizer = "bobyqa")
)Code
pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction <- lmerTest::lmer(
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic + positionFactor:ageCentered20 + positionFactor:ageCentered20Quadratic + (1 + ageCentered20 | player_idFactor),
data = player_stats_seasonal_offense_subset,
REML = FALSE,
control = lmerControl(optimizer = "bobyqa")
)
summary(pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction)Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula:
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic +
positionFactor:ageCentered20 + positionFactor:ageCentered20Quadratic +
(1 + ageCentered20 | player_idFactor)
Data: player_stats_seasonal_offense_subset
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
70621.8 70750.5 -35291.9 70583.8 6426
Scaled residuals:
Min 1Q Median 3Q Max
-6.7812 -0.4499 -0.1064 0.3837 4.7504
Random effects:
Groups Name Variance Std.Dev. Corr
player_idFactor (Intercept) 3874.28 62.24
ageCentered20 56.85 7.54 -0.57
Residual 2136.45 46.22
Number of obs: 6445, groups: player_idFactor, 1292
Fixed effects:
Estimate Std. Error df
(Intercept) -23.02774 26.67799 4427.31960
positionFactorQB 65.80803 28.07816 4194.19857
positionFactorRB 62.12817 27.68236 4329.35668
positionFactorTE 20.05015 27.89862 4326.89022
positionFactorWR 29.59854 27.35775 4370.07994
ageCentered20 11.79540 7.43993 4959.04172
ageCentered20Quadratic -0.87148 0.50915 4523.63824
positionFactorQB:ageCentered20 7.19239 7.65334 4914.64310
positionFactorRB:ageCentered20 1.31528 7.76909 4927.20674
positionFactorTE:ageCentered20 -0.85010 7.73970 4938.17986
positionFactorWR:ageCentered20 5.36603 7.64413 4949.66508
positionFactorQB:ageCentered20Quadratic -0.49079 0.51719 4542.27913
positionFactorRB:ageCentered20Quadratic -0.61350 0.53800 4439.78564
positionFactorTE:ageCentered20Quadratic 0.06118 0.52899 4492.88066
positionFactorWR:ageCentered20Quadratic -0.65597 0.52521 4496.71285
t value Pr(>|t|)
(Intercept) -0.863 0.3881
positionFactorQB 2.344 0.0191 *
positionFactorRB 2.244 0.0249 *
positionFactorTE 0.719 0.4724
positionFactorWR 1.082 0.2794
ageCentered20 1.585 0.1129
ageCentered20Quadratic -1.712 0.0870 .
positionFactorQB:ageCentered20 0.940 0.3474
positionFactorRB:ageCentered20 0.169 0.8656
positionFactorTE:ageCentered20 -0.110 0.9125
positionFactorWR:ageCentered20 0.702 0.4827
positionFactorQB:ageCentered20Quadratic -0.949 0.3427
positionFactorRB:ageCentered20Quadratic -1.140 0.2542
positionFactorTE:ageCentered20Quadratic 0.116 0.9079
positionFactorWR:ageCentered20Quadratic -1.249 0.2117
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R2m R2c
[1,] 0.1582759 0.6751097
Code
positionFactor emmean SE df lower.CL upper.CL
FB 7.62 9.43 1344 -10.9 26.1
QB 94.20 4.73 1111 84.9 103.5
RB 46.59 3.71 1375 39.3 53.9
TE 25.37 3.78 1247 18.0 32.8
WR 37.80 2.89 1333 32.1 43.5
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
ageCentered20 emmean SE df lower.CL upper.CL
6.4 42.3 2.43 1301 37.5 47.1
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
ageCentered20Quadratic emmean SE df lower.CL upper.CL
51.5 42.2 2.33 1234 37.6 46.7
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
[1] 70621.85
Code
pointsPerSeason_positionAge_newData$fantasyPoints_randomLinearFixedQuadraticSlopesInteraction <- predict(
object = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction,
newdata = pointsPerSeason_positionAge_newData,
re.form = NA
)
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_randomLinearFixedQuadraticSlopesInteraction,
color = positionFactor
)
) +
geom_line(linewidth = 2) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Mixed Model with Random Intercepts, Random Linear Slopes, and Fixed Quadratic Slopes in Interaction With Position"
) +
theme_classic()
Code
pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteractionExperience <- lmerTest::lmer(
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic + positionFactor:ageCentered20 + positionFactor:ageCentered20Quadratic + years_of_experience + (1 + ageCentered20 | player_idFactor),
data = player_stats_seasonal_offense_subset,
REML = FALSE,
control = lmerControl(optimizer = "bobyqa")
)
summary(pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteractionExperience)Linear mixed model fit by maximum likelihood . t-tests use Satterthwaite's
method [lmerModLmerTest]
Formula:
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic +
positionFactor:ageCentered20 + positionFactor:ageCentered20Quadratic +
years_of_experience + (1 + ageCentered20 | player_idFactor)
Data: player_stats_seasonal_offense_subset
Control: lmerControl(optimizer = "bobyqa")
AIC BIC logLik deviance df.resid
70592.4 70727.8 -35276.2 70552.4 6425
Scaled residuals:
Min 1Q Median 3Q Max
-6.7526 -0.4415 -0.1012 0.3834 4.7239
Random effects:
Groups Name Variance Std.Dev. Corr
player_idFactor (Intercept) 3856.96 62.104
ageCentered20 55.69 7.463 -0.58
Residual 2140.80 46.269
Number of obs: 6445, groups: player_idFactor, 1292
Fixed effects:
Estimate Std. Error df
(Intercept) -14.41747 26.71450 4443.49841
positionFactorQB 61.85599 28.07863 4206.42470
positionFactorRB 59.26821 27.68023 4341.88102
positionFactorTE 18.66483 27.89324 4337.42684
positionFactorWR 27.05937 27.35504 4381.59101
ageCentered20 6.92672 7.48505 5052.03481
ageCentered20Quadratic -0.89662 0.50892 4506.71231
years_of_experience 5.98527 1.05344 2059.25177
positionFactorQB:ageCentered20 7.15606 7.64964 4911.53510
positionFactorRB:ageCentered20 1.41150 7.76561 4923.15762
positionFactorTE:ageCentered20 -0.95441 7.73630 4933.84264
positionFactorWR:ageCentered20 5.38514 7.64072 4945.56253
positionFactorQB:ageCentered20Quadratic -0.47891 0.51694 4525.86739
positionFactorRB:ageCentered20Quadratic -0.62677 0.53771 4423.94494
positionFactorTE:ageCentered20Quadratic 0.06362 0.52872 4475.97907
positionFactorWR:ageCentered20Quadratic -0.65946 0.52493 4480.80773
t value Pr(>|t|)
(Intercept) -0.540 0.5894
positionFactorQB 2.203 0.0277 *
positionFactorRB 2.141 0.0323 *
positionFactorTE 0.669 0.5034
positionFactorWR 0.989 0.3226
ageCentered20 0.925 0.3548
ageCentered20Quadratic -1.762 0.0782 .
years_of_experience 5.682 1.52e-08 ***
positionFactorQB:ageCentered20 0.935 0.3496
positionFactorRB:ageCentered20 0.182 0.8558
positionFactorTE:ageCentered20 -0.123 0.9018
positionFactorWR:ageCentered20 0.705 0.4810
positionFactorQB:ageCentered20Quadratic -0.926 0.3543
positionFactorRB:ageCentered20Quadratic -1.166 0.2438
positionFactorTE:ageCentered20Quadratic 0.120 0.9042
positionFactorWR:ageCentered20Quadratic -1.256 0.2091
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
R2m R2c
[1,] 0.1650522 0.6694687
Code
positionFactor emmean SE df lower.CL upper.CL
FB 11.3 9.30 1351 -6.97 29.5
QB 94.3 4.65 1109 85.17 103.4
RB 47.3 3.65 1374 40.17 54.5
TE 27.1 3.73 1250 19.80 34.4
WR 38.9 2.85 1331 33.28 44.5
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
ageCentered20 emmean SE df lower.CL upper.CL
6.4 43.8 2.4 1308 39.1 48.5
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
ageCentered20Quadratic emmean SE df lower.CL upper.CL
51.5 43.8 2.4 1308 39.1 48.5
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
years_of_experience emmean SE df lower.CL upper.CL
4.6 43.8 2.4 1308 39.1 48.5
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
[1] 70592.39
11.3.4.3.3 Compare Models
After fitting several models, we now must compare their fit to determine which model fits “best” while also considering parsimony. Parsimonious models are more likely to be true and more likely to generalize to other samples, because more complex models are more likely to overfit the data. Thus, more complex models will almost always fit better than simpler models. Thus, we are not just interested in whether a more complex model fits better than the simpler model; we also care about whether the more complex model fits significantly better than the simpler model given its additional complexity. For evaluating and comparing models, we examine the likelihood ratio test, the Akaike Information Criterion (AIC), the corrected AIC (AICc), the Bayesian Information Criterion (BIC), \(R^2\), deviance, and log likelihood.
The BIC penalizes model complexity more than the AIC does. The BIC is preferable when there is a “true” model, and one intends to identify the true model. The AIC is preferable when we are concerned more about predictive accuracy and when overfitting is less of a concern. Because we are more concerned about predictive accuracy and we do not believe one of these models is the “true” model per se of age-related changes in fantasy performance, we will give more weight to AIC than BIC.
Below, we specify various groups of models for the model fit comparisons:
Code
lmVsMixedModel <- list(
"nullModel" = pointsPerSeason_nullModel,
"randomIntercepts" = pointsPerSeason_randomIntercepts
)
lmAndMixedModels <- list(
"nullModel" = pointsPerSeason_nullModel,
"linearRegression" = pointsPerSeason_linearRegression,
"quadraticRegression" = pointsPerSeason_quadraticRegression,
"randomIntercepts" = pointsPerSeason_randomIntercepts,
"position" = pointsPerSeason_position,
"fixedLinear" = pointsPerSeason_positionAgeFixedLinearSlopes,
"randomLinear" = pointsPerSeason_positionAgeRandomLinearSlopes,
"randomLinearFixedQuadratic" = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes,
"randomLinearFixedQuadraticInteraction" = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction
)
mixedModels <- list(
"randomIntercepts" = pointsPerSeason_randomIntercepts,
"position" = pointsPerSeason_position,
"fixedLinear" = pointsPerSeason_positionAgeFixedLinearSlopes,
"randomLinear" = pointsPerSeason_positionAgeRandomLinearSlopes,
"randomLinearFixedQuadratic" = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes,
"randomLinearFixedQuadraticInteraction" = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction
)
mixedModels1 <- list(
"randomIntercepts" = pointsPerSeason_randomIntercepts,
"position" = pointsPerSeason_position
)
mixedModels2 <- list(
"fixedLinear" = pointsPerSeason_positionAgeFixedLinearSlopes,
"randomLinear" = pointsPerSeason_positionAgeRandomLinearSlopes,
"randomLinearFixedQuadratic" = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes,
"randomLinearFixedQuadraticInteraction" = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction
)11.3.4.3.3.1 Likelihood Ratio Test
Code
11.3.4.3.3.2 Akaike Information Criterion (AIC)
Code
AIC(
pointsPerSeason_nullModel,
pointsPerSeason_linearRegression,
pointsPerSeason_quadraticRegression,
pointsPerSeason_randomIntercepts,
pointsPerSeason_positionAgeFixedLinearSlopes,
pointsPerSeason_positionAgeRandomLinearSlopes,
pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes,
pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction
) dAIC df
randomLinearFixedQuadraticInteraction 0.0 19
randomLinearFixedQuadratic 43.6 11
randomLinear 397.7 10
fixedLinear 621.1 8
quadraticRegression 3444.5 16
linearRegression 3455.9 11
position 77144.5 7
randomIntercepts 77422.2 3
nullModel 84097.8 2
11.3.4.3.3.3 Corrected Akaike Information Criterion (AICc)
Code
dAICc df
randomIntercepts 0.0 3
nullModel 6675.7 2
Code
dAICc df
position 0.0 7
randomIntercepts 277.7 3
dAICc df
randomLinearFixedQuadraticInteraction 0.0 19
randomLinearFixedQuadratic 43.5 11
randomLinear 397.6 10
fixedLinear 621.0 8
11.3.4.3.3.4 Bayesian Information Criterion (BIC)
Code
BIC(
pointsPerSeason_nullModel,
pointsPerSeason_linearRegression,
pointsPerSeason_quadraticRegression,
pointsPerSeason_randomIntercepts,
pointsPerSeason_positionAgeFixedLinearSlopes,
pointsPerSeason_positionAgeRandomLinearSlopes,
pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes,
pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction
)Code
dBIC df
randomIntercepts 0.0 3
nullModel 6668.2 2
Code
dBIC df
position 0.0 7
randomIntercepts 247.6 3
dBIC df
randomLinearFixedQuadratic 0.0 11
randomLinearFixedQuadraticInteraction 10.5 19
randomLinear 347.3 10
fixedLinear 557.1 8
11.3.4.3.3.5 \(R^2\)
[1] 0
[1] 0.1422979
[1] 0.1451436
R2m R2c
[1,] 0 0.4847453
R2m R2c
[1,] 0.09922355 0.5386768
R2m R2c
[1,] 0.09797596 0.5835368
R2m R2c
[1,] 0.1655593 0.6746525
R2m R2c
[1,] 0.1582759 0.6751097
11.3.4.3.3.6 Deviance
[1] 73167060
[1] 36895690
[1] 36773276
[1] 148038
[1] 71226.93
[1] 70999.5
[1] 70643.47
[1] 70583.85
11.3.4.3.3.7 Log Likelihood
'log Lik.' -77357.85 (df=2)
'log Lik.' -37027.89 (df=11)
'log Lik.' -37017.18 (df=16)
'log Lik.' -74019.01 (df=3)
'log Lik.' -35613.46 (df=8)
'log Lik.' -35499.75 (df=10)
'log Lik.' -35321.74 (df=11)
'log Lik.' -35291.92 (df=19)
11.3.4.3.4 Generalized Additive Model
[1] 4
Code
pointsPerSeason_gam <- bam( # using bam() instead of gam() for faster estimation due to large size of data
fantasy_points ~ positionFactor + s(ageCentered20, by = positionFactor) + years_of_experience + s(player_idFactor, ageCentered20, bs = "re"),
data = player_stats_seasonal_offense_subset,
nthreads = num_cores
)
pointsPerSeason_gamSummary <- summary(pointsPerSeason_gam)
pointsPerSeason_gamSummary
Family: gaussian
Link function: identity
Formula:
fantasy_points ~ positionFactor + s(ageCentered20, by = positionFactor) +
years_of_experience + s(player_idFactor, ageCentered20, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.1108 10.2247 -0.695 0.48680
positionFactorQB 87.7506 10.7149 8.190 3.23e-16 ***
positionFactorRB 28.7960 10.2812 2.801 0.00511 **
positionFactorTE 12.6026 10.2924 1.224 0.22083
positionFactorWR 22.6299 9.9928 2.265 0.02357 *
years_of_experience 4.2443 0.9271 4.578 4.80e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ageCentered20):positionFactorFB 1.761 2.221 2.488 0.0735 .
s(ageCentered20):positionFactorQB 5.143 6.310 34.597 <2e-16 ***
s(ageCentered20):positionFactorRB 4.892 5.905 35.700 <2e-16 ***
s(ageCentered20):positionFactorTE 4.142 5.146 14.293 <2e-16 ***
s(ageCentered20):positionFactorWR 5.288 6.300 39.727 <2e-16 ***
s(player_idFactor,ageCentered20) 880.138 1287.000 5.070 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.583 Deviance explained = 64.1%
fREML = 35673 Scale est. = 2784.9 n = 6445
[1] 0.5828168
R2m R2c
[1,] 0.5574201 0.5574201
[1] 70254.43
11.3.4.3.4.1 Compare Models
Code
linearMixedModelsVsGAM <- list(
"fixedLinear" = pointsPerSeason_positionAgeFixedLinearSlopes,
"randomLinear" = pointsPerSeason_positionAgeRandomLinearSlopes,
"randomLinearFixedQuadratic" = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes,
"randomLinearFixedQuadraticInteraction" = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction,
"gam" = pointsPerSeason_gam
)
AIC(
pointsPerSeason_positionAgeFixedLinearSlopes,
pointsPerSeason_positionAgeRandomLinearSlopes,
pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopes,
pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteraction,
pointsPerSeason_gam
)Code
dAICc df
gam 0.0 910.4
randomLinearFixedQuadraticInteraction 67.7 19
randomLinearFixedQuadratic 111.2 11
randomLinear 465.3 10
fixedLinear 688.7 8
Code
Code
dBIC df
randomLinearFixedQuadratic 0.0 11
randomLinearFixedQuadraticInteraction 10.5 19
randomLinear 347.3 10
fixedLinear 557.1 8
gam 5678.5 910.4
[1] 0
[1] 0.1422979
[1] 0.1451436
R2m R2c
[1,] 0 0.4847453
R2m R2c
[1,] 0.09922355 0.5386768
R2m R2c
[1,] 0.09797596 0.5835368
R2m R2c
[1,] 0.1655593 0.6746525
R2m R2c
[1,] 0.1582759 0.6751097
R2m R2c
[1,] 0.5574201 0.5574201
[1] 73167060
[1] 36895690
[1] 36773276
[1] 148038
[1] 71226.93
[1] 70999.5
[1] 70643.47
[1] 70583.85
[1] 15421789
'log Lik.' -77357.85 (df=2)
'log Lik.' -37027.89 (df=11)
'log Lik.' -37017.18 (df=16)
'log Lik.' -74019.01 (df=3)
'log Lik.' -35613.46 (df=8)
'log Lik.' -35499.75 (df=10)
'log Lik.' -35321.74 (df=11)
'log Lik.' -35291.92 (df=19)
'log Lik.' -34216.86 (df=910.3565)
11.3.4.3.5 Players Who Were (at Least Once) at the Top of the End-of-Season Depth Chart
Code
pointsPerSeasonDepth_positionAgeRandomLinearFixedQuadraticSlopesInteractionExperience <- lmerTest::lmer(
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic + positionFactor:ageCentered20 + positionFactor:ageCentered20Quadratic + years_of_experience + (1 + ageCentered20 | player_idFactor),
data = player_stats_seasonal_offense_subset,
control = lmerControl(optimizer = "bobyqa")
)
summary(pointsPerSeasonDepth_positionAgeRandomLinearFixedQuadraticSlopesInteractionExperience)Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula:
fantasy_points ~ positionFactor + ageCentered20 + ageCentered20Quadratic +
positionFactor:ageCentered20 + positionFactor:ageCentered20Quadratic +
years_of_experience + (1 + ageCentered20 | player_idFactor)
Data: player_stats_seasonal_offense_subset
Control: lmerControl(optimizer = "bobyqa")
REML criterion at convergence: 70526.5
Scaled residuals:
Min 1Q Median 3Q Max
-6.7527 -0.4422 -0.1011 0.3827 4.7192
Random effects:
Groups Name Variance Std.Dev. Corr
player_idFactor (Intercept) 3892.71 62.392
ageCentered20 56.65 7.527 -0.58
Residual 2142.60 46.288
Number of obs: 6445, groups: player_idFactor, 1292
Fixed effects:
Estimate Std. Error df
(Intercept) -14.50932 26.75909 4424.09802
positionFactorQB 61.83156 28.12724 4186.26252
positionFactorRB 59.23635 27.72713 4322.39098
positionFactorTE 18.69491 27.94051 4317.87428
positionFactorWR 27.03245 27.40111 4362.14111
ageCentered20 6.96089 7.49658 5041.41649
ageCentered20Quadratic -0.89892 0.50977 4503.24842
years_of_experience 5.97746 1.05605 2050.13256
positionFactorQB:ageCentered20 7.16792 7.66153 4900.34818
positionFactorRB:ageCentered20 1.43053 7.77761 4912.87816
positionFactorTE:ageCentered20 -0.96261 7.74822 4923.54745
positionFactorWR:ageCentered20 5.40171 7.65247 4935.17920
positionFactorQB:ageCentered20Quadratic -0.48061 0.51780 4522.63658
positionFactorRB:ageCentered20Quadratic -0.62953 0.53863 4421.51275
positionFactorTE:ageCentered20Quadratic 0.06391 0.52961 4473.57721
positionFactorWR:ageCentered20Quadratic -0.66176 0.52582 4477.90547
t value Pr(>|t|)
(Intercept) -0.542 0.5877
positionFactorQB 2.198 0.0280 *
positionFactorRB 2.136 0.0327 *
positionFactorTE 0.669 0.5035
positionFactorWR 0.987 0.3239
ageCentered20 0.929 0.3532
ageCentered20Quadratic -1.763 0.0779 .
years_of_experience 5.660 1.72e-08 ***
positionFactorQB:ageCentered20 0.936 0.3495
positionFactorRB:ageCentered20 0.184 0.8541
positionFactorTE:ageCentered20 -0.124 0.9011
positionFactorWR:ageCentered20 0.706 0.4803
positionFactorQB:ageCentered20Quadratic -0.928 0.3534
positionFactorRB:ageCentered20Quadratic -1.169 0.2426
positionFactorTE:ageCentered20Quadratic 0.121 0.9040
positionFactorWR:ageCentered20Quadratic -1.259 0.2083
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
R2m R2c
[1,] 0.1648185 0.6708484
Code
positionFactor emmean SE df lower.CL upper.CL
FB 11.3 9.30 1333 -6.99 29.5
QB 94.2 4.65 1094 85.11 103.4
RB 47.3 3.65 1356 40.10 54.4
TE 27.1 3.73 1234 19.77 34.4
WR 38.8 2.85 1313 33.22 44.4
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
ageCentered20 emmean SE df lower.CL upper.CL
6.4 43.7 2.4 1290 39 48.4
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
ageCentered20Quadratic emmean SE df lower.CL upper.CL
51.5 43.7 2.4 1290 39 48.4
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
years_of_experience emmean SE df lower.CL upper.CL
4.6 43.7 2.4 1290 39 48.4
Results are averaged over the levels of: positionFactor
Degrees-of-freedom method: kenward-roger
Confidence level used: 0.95
Code
[1] 70566.51
Code
pointsPerSeasonDepth_gam <- bam( # using bam() instead of gam() for faster estimation due to large size of data
fantasy_points ~ positionFactor + s(ageCentered20, by = positionFactor) + years_of_experience + s(player_idFactor, ageCentered20, bs = "re"),
data = player_stats_seasonal_offense_subsetDepth,
nthreads = num_cores
)
pointsPerSeasonDepth_gamSummary <- summary(pointsPerSeasonDepth_gam)
pointsPerSeasonDepth_gamSummary
Family: gaussian
Link function: identity
Formula:
fantasy_points ~ positionFactor + s(ageCentered20, by = positionFactor) +
years_of_experience + s(player_idFactor, ageCentered20, bs = "re")
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.464 11.895 0.543 0.586866
positionFactorQB 112.659 12.225 9.215 < 2e-16 ***
positionFactorRB 52.338 11.810 4.431 9.58e-06 ***
positionFactorTE 27.058 11.954 2.264 0.023648 *
positionFactorWR 41.742 11.430 3.652 0.000263 ***
years_of_experience 1.072 1.181 0.908 0.363798
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(ageCentered20):positionFactorFB 1.554 1.931 0.635 0.461
s(ageCentered20):positionFactorQB 4.974 6.127 19.094 < 2e-16 ***
s(ageCentered20):positionFactorRB 4.149 5.127 18.527 < 2e-16 ***
s(ageCentered20):positionFactorTE 3.761 4.711 7.432 2.17e-06 ***
s(ageCentered20):positionFactorWR 4.783 5.802 22.841 < 2e-16 ***
s(player_idFactor,ageCentered20) 600.644 780.000 6.092 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.564 Deviance explained = 61.7%
fREML = 28625 Scale est. = 3193.5 n = 5121
[1] 0.564187
R2m R2c
[1,] 0.5405308 0.5405308
[1] 56444.14
11.3.5 Plots of Model-Implied Fantasy Points by Position and Age
Code
# From Quadratic Model: All Players
pointsPerSeason_positionAge_newData$fantasyPoints_quadratic <- predict(
object = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteractionExperience,
newdata = pointsPerSeason_positionAge_newData,
re.form = NA
)
# From Quadratic Model: Players at Top of End-of-Season Depth Chart
pointsPerSeason_positionAge_newData$fantasyPoints_depthQuadratic <- predict(
object = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteractionExperience,
newdata = pointsPerSeason_positionAge_newData,
re.form = NA
)
# From GAM Model: All Players
pointsPerSeason_positionAge_newData$fantasyPoints_gam <- predict(
object = pointsPerSeason_gam,
newdata = pointsPerSeason_positionAge_newData,
newdata.guaranteed = TRUE,
exclude = "s(player_idFactor,ageCentered20)"
)
# From GAM Model: Players at Top of End-of-Season Depth Chart
pointsPerSeason_positionAge_newData$fantasyPoints_depthGAM <- predict(
object = pointsPerSeasonDepth_gam,
newdata = pointsPerSeason_positionAge_newData,
newdata.guaranteed = TRUE,
exclude = "s(player_idFactor,ageCentered20)"
)Plots of model-implied fantasy points by position and age are in Figures 11.20–11.23.
11.3.5.1 Quadratic Model
Code
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_quadratic,
color = positionFactor
)
) +
geom_smooth() +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Quadratic Model with All Players"
) +
theme_classic()
11.3.5.2 Quadratic Model: Top of Depth Chart
Code
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_depthQuadratic,
color = positionFactor
)
) +
geom_smooth() +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Quadratic Model with Players Who Were Once at Top of End-of-Season Depth Chart"
) +
theme_classic()
11.3.5.3 Generalized Additive Model
Code
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_gam,
color = positionFactor
)
) +
geom_smooth() +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Generalized Additive Model with All Players"
) +
theme_classic()
11.3.5.4 Generalized Additive Model: Top of Depth Chart
Code
ggplot2::ggplot(
data = pointsPerSeason_positionAge_newData,
mapping = aes(
x = age,
y = fantasyPoints_depthGAM,
color = positionFactor
)
) +
geom_smooth() +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age and Position",
subtitle = "Generalized Additive Model with Players Who Were Once at Top of End-of-Season Depth Chart"
) +
theme_classic()
11.3.6 Plots of Individuals’ Model-Implied Fantasy Points by Age
Code
player_stats_seasonal_offense_subsetCC$fantasyPoints_quadratic <- predict(
object = pointsPerSeason_positionAgeRandomLinearFixedQuadraticSlopesInteractionExperience,
newdata = player_stats_seasonal_offense_subsetCC
)
player_stats_seasonal_offense_subsetCC$fantasyPoints_gam <- predict(
object = pointsPerSeason_gam,
newdata = player_stats_seasonal_offense_subsetCC
)Code
zeroAge <- pointsPerSeason_positionAge_newData %>%
group_by(positionFactor) %>%
filter(fantasyPoints_gam < 0) %>%
slice(which.min(age))
peakAge <- pointsPerSeason_positionAge_newData %>%
group_by(positionFactor) %>%
slice(which.max(fantasyPoints_gam))
peakAge2 <- pointsPerSeason_positionAge_newData %>%
filter(age > 22) %>%
group_by(positionFactor) %>%
slice(which.max(fantasyPoints_gam))
qbPeakAge <- round(peakAge$age[which(peakAge$positionFactor == "QB")], 0)
fbPeakAge <- round(peakAge$age[which(peakAge$positionFactor == "FB")], 0)
rbPeakAge <- round(peakAge$age[which(peakAge$positionFactor == "RB")], 0)
wrPeakAge <- round(peakAge$age[which(peakAge$positionFactor == "WR")], 0)
wrPeakAge2 <- round(peakAge2$age[which(peakAge$positionFactor == "WR")], 0)
tePeakAge <- round(peakAge$age[which(peakAge$positionFactor == "TE")], 0)
qbZeroAge <- round(zeroAge$age[which(zeroAge$positionFactor == "QB")], 0)
fbZeroAge <- round(zeroAge$age[which(zeroAge$positionFactor == "FB")], 0)
rbZeroAge <- round(zeroAge$age[which(zeroAge$positionFactor == "RB")], 0)
wrZeroAge <- round(zeroAge$age[which(zeroAge$positionFactor == "WR")], 0)
teZeroAge <- round(zeroAge$age[which(zeroAge$positionFactor == "TE")], 0)11.3.6.1 Quarterbacks
A plot of Quarterbacks’ model-implied fantasy points by age is in Figure 11.24. The model-implied peak of Quarterbacks’ fantasy points is at age 20. The model-predicted value of zero fantasy points for Quarterbacks is at 36.
Code
plot_individualFantasyPointsByAgeQB <- ggplot(
data = player_stats_seasonal_offense_subsetCC %>% filter(position == "QB"),
mapping = aes(
x = age,
y = fantasyPoints_gam,
group = player_id)) +
geom_smooth(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
se = FALSE,
linewidth = 0.5,
color = "black") +
geom_smooth(
mapping = aes(
x = age,
y = fantasyPoints_gam
),
data = pointsPerSeason_positionAge_newData %>% filter(positionFactor == "QB"),
inherit.aes = FALSE,
se = TRUE,
linewidth = 2
) +
geom_point(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
size = 1,
color = "transparent" # make points invisible but keep tooltips
) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Quarterbacks"
) +
theme_classic()
ggplotly(
plot_individualFantasyPointsByAgeQB,
tooltip = c("age","fantasyPoints_gam","text","label")
)11.3.6.2 Fullbacks
A plot of Fullbacks’ model-implied fantasy points by age is in Figure 11.25. The model-implied peak of Fullbacks’ fantasy points is at age 26. The model-predicted value of zero fantasy points for Fullbacks is at 32.
Code
plot_individualFantasyPointsByAgeFB <- ggplot(
data = player_stats_seasonal_offense_subsetCC %>% filter(position == "FB"),
mapping = aes(
x = age,
y = fantasyPoints_gam,
group = player_id)) +
geom_smooth(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
se = FALSE,
linewidth = 0.5,
color = "black") +
geom_smooth(
mapping = aes(
x = age,
y = fantasyPoints_gam
),
data = pointsPerSeason_positionAge_newData %>% filter(positionFactor == "FB"),
inherit.aes = FALSE,
se = TRUE,
linewidth = 2
) +
geom_point(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
size = 1,
color = "transparent" # make points invisible but keep tooltips
) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Fullbacks"
) +
theme_classic()
ggplotly(
plot_individualFantasyPointsByAgeFB,
tooltip = c("age","fantasyPoints_gam","text","label")
)11.3.6.3 Running Backs
A plot of Running Backs’ model-implied fantasy points by age is in Figure 11.26. The model-implied peak of Running Backs’ fantasy points is at age 20. The model-predicted value of zero fantasy points for Running Backs is at 30.
Code
plot_individualFantasyPointsByAgeRB <- ggplot(
data = player_stats_seasonal_offense_subsetCC %>% filter(position == "RB"),
mapping = aes(
x = age,
y = fantasyPoints_gam,
group = player_id)) +
geom_smooth(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
se = FALSE,
linewidth = 0.5,
color = "black") +
geom_smooth(
mapping = aes(
x = age,
y = fantasyPoints_gam
),
data = pointsPerSeason_positionAge_newData %>% filter(positionFactor == "RB"),
inherit.aes = FALSE,
se = TRUE,
linewidth = 2
) +
geom_point(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
size = 1,
color = "transparent" # make points invisible but keep tooltips
) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Running Backs"
) +
theme_classic()
ggplotly(
plot_individualFantasyPointsByAgeRB,
tooltip = c("age","fantasyPoints_gam","text","label")
)11.3.6.4 Wide Receivers
A plot of Wide Receivers’ model-implied fantasy points by age is in Figure 11.27. The model-implied peaks of Wide Receivers’ fantasy points are at ages 20 and 26. The model-predicted value of zero fantasy points for Wide Receivers is at 30.
Code
plot_individualFantasyPointsByAgeWR <- ggplot(
data = player_stats_seasonal_offense_subsetCC %>% filter(position == "WR"),
mapping = aes(
x = age,
y = fantasyPoints_gam,
group = player_id)) +
geom_smooth(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
se = FALSE,
linewidth = 0.5,
color = "black") +
geom_smooth(
mapping = aes(
x = age,
y = fantasyPoints_gam
),
data = pointsPerSeason_positionAge_newData %>% filter(positionFactor == "WR"),
inherit.aes = FALSE,
se = TRUE,
linewidth = 2
) +
geom_point(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
size = 1,
color = "transparent" # make points invisible but keep tooltips
) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Wide Receivers"
) +
theme_classic()
ggplotly(
plot_individualFantasyPointsByAgeWR,
tooltip = c("age","fantasyPoints_gam","text","label")
)11.3.6.5 Tight Ends
A plot of Tight Ends’ model-implied fantasy points by age is in Figure 11.28. The model-implied peak of Tight Ends’ fantasy points is at age 26. The model-predicted value of zero fantasy points for Tight Ends is at 32.
Code
plot_individualFantasyPointsByAgeTE <- ggplot(
data = player_stats_seasonal_offense_subsetCC %>% filter(position == "TE"),
mapping = aes(
x = age,
y = fantasyPoints_gam,
group = player_id)) +
geom_smooth(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
se = FALSE,
linewidth = 0.5,
color = "black") +
geom_smooth(
mapping = aes(
x = age,
y = fantasyPoints_gam
),
data = pointsPerSeason_positionAge_newData %>% filter(positionFactor == "TE"),
inherit.aes = FALSE,
se = TRUE,
linewidth = 2
) +
geom_point(
aes(
x = age,
y = fantasyPoints_gam,
text = player_display_name, # add player name for mouse over tooltip
label = season # add season for mouse over tooltip
),
size = 1,
color = "transparent" # make points invisible but keep tooltips
) +
labs(
x = "Player Age (years)",
y = "Fantasy Points (Season)",
title = "Fantasy Points (Season) by Player Age: Tight Ends"
) +
theme_classic()
ggplotly(
plot_individualFantasyPointsByAgeTE,
tooltip = c("age","fantasyPoints_gam","text","label")
)11.3.7 Summary of Findings
We applied mixed models with random intercepts and random slopes to allow our model estimates to account for the fact that different players have different starting points (intercepts) and different changes over time (slopes) in fantasy points. A quadratic, inverted-U-shaped form as a function of age fit better than a linear form as a function of age in predicting players’ fantasy points. A generalized additive model that allowed further nonlinearity fit even better than the quadratic model.
Based on the bivariate scatterplots between age and fantasy points, we might conclude that players tend to stay stable or even increase in fantasy points with age. However, this conclusion would be wrong. When we account for the longitudinal data (i.e., multiple observations over time for the same player) using mixed models, we observe that fantasy points tend to decrease with age, with the timing and rate of decline differing for each position. In other words, the association between age and fantasy points differs at the person level versus the group level. This is an example of Simpson’s paradox.
The discrepancy between the positive or null association between age and players’ fantasy points at the group level, and the negative association at the person level may be due, in part, to the selective attrition of players with age. The players who play the longest will tend to be the highest performing players, whereas the poorest performing players will retire or get dropped from the team at younger ages. Thus, the selective attrition of weaker players may make it seem that there is no association between age and performance (or even a positive one!), when in fact, players’ performance tends to decrease with age after age 26 or so (with the timing differing from position to position), until the player eventually retires or is dropped from the team. Selective attrition is common in longitudinal studies (such as this one) and intervention studies. For instance, attrition may be more likely for individuals from lower socioeconomic status backgrounds because they may face more challenges in continuing in longitudinal studies such as fewer financial resources, greater life stressors, etc. In addition, people who experience side effects or lack of improvement may be more likely to drop out of intervention studies. Examining only those who completed treatment (an example of selection bias) would make the intervention look more effective than it actually was because the people who stay in the study are those who experience the greatest improvement. Thus, it is important to use approaches such as mixed models or other approaches that account for the multiple observations from the same person, that use all available information, and that do not exclude people who do not complete all portions of the study.
11.4 Conclusion
Mixed models allow accounting for multiple levels or units of analysis and to include both fixed and random effects. Inclusion of random effects allows the association between the predictor variables (the intercept and age) and the outcome variable (fantasy points) to differ for each individual in the grouping level (in this case, each player). This allows for more accurately predicting phenomena. Based on the bivariate scatterplots between age and fantasy points, we might conclude that players tend to stay stable or even increase in fantasy points with age. However, this conclusion would be wrong. When we account for the longitudinal data using mixed models, we observe that players’ fantasy points tend to decrease with age, with the timing and rate of decline differing for each position. In other words, the association between age and fantasy points differs at the person level versus the group level, which is an example of Simpson’s paradox. In sum, mixed models are valuable for examining associations between variables when there are multiple levels of data (i.e., multiple observations within the same unit, known as clustering or nesting). It is important not to confuse the association at one level (e.g., group level) with the association at another level (e.g., person level), which is an example of the ecological fallacy.
11.5 Session Info
R version 4.4.1 (2024-06-14)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 22.04.4 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so; LAPACK version 3.10.0
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
time zone: UTC
tzcode source: system (glibc)
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4
[5] purrr_1.0.2 readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[9] tidyverse_2.0.0 viridis_0.6.5 viridisLite_0.4.2 plotly_4.10.4
[13] ggplot2_3.5.1 bbmle_1.0.25.1 AICcmodavg_2.3-3 mgcv_1.9-1
[17] nlme_3.1-164 sjstats_0.19.0 emmeans_1.10.3 MuMIn_1.48.4
[21] lmerTest_3.1-3 lme4_1.1-35.5 Matrix_1.7-0 petersenlab_1.0.3
loaded via a namespace (and not attached):
[1] DBI_1.2.3 mnormt_2.1.1 gridExtra_2.3
[4] rlang_1.1.4 magrittr_2.0.3 compiler_4.4.1
[7] vctrs_0.6.5 reshape2_1.4.4 quadprog_1.5-8
[10] pkgconfig_2.0.3 fastmap_1.2.0 backports_1.5.0
[13] labeling_0.4.3 pbivnorm_0.6.0 utf8_1.2.4
[16] rmarkdown_2.27 tzdb_0.4.0 nloptr_2.1.1
[19] xfun_0.46 jsonlite_1.8.8 VGAM_1.1-11
[22] psych_2.4.6.26 broom_1.0.6 lavaan_0.6-18
[25] cluster_2.1.6 R6_2.5.1 stringi_1.8.4
[28] RColorBrewer_1.1-3 boot_1.3-30 rpart_4.1.23
[31] numDeriv_2016.8-1.1 estimability_1.5.1 Rcpp_1.0.13
[34] knitr_1.48 base64enc_0.1-3 timechange_0.3.0
[37] splines_4.4.1 nnet_7.3-19 tidyselect_1.2.1
[40] rstudioapi_0.16.0 yaml_2.3.10 lattice_0.22-6
[43] plyr_1.8.9 withr_3.0.0 coda_0.19-4.1
[46] evaluate_0.24.0 foreign_0.8-86 survival_3.6-4
[49] pillar_1.9.0 checkmate_2.3.1 insight_0.20.2
[52] generics_0.1.3 mix_1.0-12 hms_1.1.3
[55] munsell_0.5.1 scales_1.3.0 minqa_1.2.7
[58] xtable_1.8-4 glue_1.7.0 unmarked_1.4.1
[61] Hmisc_5.1-3 lazyeval_0.2.2 tools_4.4.1
[64] data.table_1.15.4 mvtnorm_1.2-5 grid_4.4.1
[67] mitools_2.4 crosstalk_1.2.1 bdsmatrix_1.3-7
[70] datawizard_0.12.2 colorspace_2.1-1 performance_0.12.2
[73] htmlTable_2.4.3 Formula_1.2-5 cli_3.6.3
[76] fansi_1.0.6 gtable_0.3.5 digest_0.6.36
[79] pbkrtest_0.5.3 farver_2.1.2 htmlwidgets_1.6.4
[82] htmltools_0.5.8.1 lifecycle_1.0.4 httr_1.4.7
[85] MASS_7.3-60.2